InfiniBand Promises Greater Speed,
Scalability For Servers And Clusters 



InfiniBand, a switch-fabric architecture, offers far greater bandwidth than the bus-structured
architectures of the past. It also eases redundancy and hot-pluggability problems. This
network approach segments messages of up to 2 Gbytes into 256- to 4096-byte packets.




If bus architectures such as PCI and
PCI-X could deliver the higher I/O
clock rates and scalability essential
for 21st century networks, an I/O
architecture known as InfiniBand
never would have been hatched. In the
early 1990s, 66-MHz processors and
10-Mbit/s networks were considered
state-of-the-art. Servers back then
cranked out 54 transactions/min., a
fairly paltry figure by today's
standards. PCI and PCI-X were modeled
to meet the needs of the last decade.
With their initial 133-Mbyte/s
bandwidth and subsequent increases to
1.066 Gbytes/s, they did.

Look at where we are now. This
March, Intel announced a Pentium III
Xeon processor that runs at 1 GHz. As
a result, servers have sped past
135,000 transactions/s. It's no wonder
that an up-to-date I/O structure has
become necessary. This has given rise
to a new I/O structure, InfiniBand,
whose rollout is under way. Though
still a fledgling specification,
version 1.0 should launch this month.

InfiniBand is a network approach,
rather than a bus approach, to I/O
architecture (see the figure). The key
components in an InfiniBand network
are the switches, the host-channel
adapters (HCAs), and the
target-channel adapters (TCAs).

The switch is a relatively simple
device. It forwards dual-simplex,
8-bit/10-bit encoded packets based on
two fields that they contain, known as
a destination local ID and a
service-level field. Messages of up to
2 Gbytes are segmented into packets
ranging from 256 to 4096 bytes in
length, depending upon the
application.
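
As a rough illustration of the segmentation step, the Python sketch below splits a message into packets whose payloads do not exceed a chosen size. The function name segment_message, the payload_size parameter, and the (sequence number, payload) tuple layout are illustrative assumptions, not part of the InfiniBand specification. Real packets also carry headers and check fields that are omitted here, and in this simplified version the final packet may be shorter than the chosen size.

```python
def segment_message(message: bytes, payload_size: int = 4096):
    """Split a message into (sequence_number, payload) packets.

    Hypothetical helper for illustration only; header fields such as
    the destination local ID and service level are not modeled.
    """
    if not 256 <= payload_size <= 4096:
        raise ValueError("payload_size must be between 256 and 4096 bytes")
    return [
        (seq, message[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(message), payload_size))
    ]


# A 10,000-byte message split into packets of at most 4096 bytes.
packets = segment_message(bytes(10_000), payload_size=4096)
print(len(packets))                          # 3
print([len(payload) for _, payload in packets])  # [4096, 4096, 1808]
```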

If a reliable protocol is selected, it
can be guaranteed that every packet
will be delivered once and only once,
in order, and uncorrupted. Users are
notified if that's not possible. Once
packets are sent out, they're
reassembled at the far end to complete
the transaction.
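
The delivery guarantees described above can be pictured with a small receiver-side check. The Python sketch below is only a toy model: the names make_packet and reassemble, the tuple layout, and the use of zlib's CRC-32 as the corruption check are all assumptions made for illustration, not the mechanisms that the InfiniBand specification defines.

```python
import zlib


def make_packet(seq: int, payload: bytes):
    """Build a toy packet: (sequence number, payload, checksum)."""
    # CRC-32 from zlib is a stand-in for a real link-level check field.
    return (seq, payload, zlib.crc32(payload))


def reassemble(packets, expected_count: int) -> bytes:
    """Rebuild a message, rejecting missing, repeated, misordered,
    or corrupted packets by raising ValueError."""
    payloads = []
    for expected_seq, (seq, payload, checksum) in enumerate(packets):
        if seq != expected_seq:
            raise ValueError(f"expected packet {expected_seq}, got {seq}")
        if zlib.crc32(payload) != checksum:
            raise ValueError(f"packet {seq} is corrupted")
        payloads.append(payload)
    if len(payloads) != expected_count:
        raise ValueError(
            f"expected {expected_count} packets, got {len(payloads)}"
        )
    return b"".join(payloads)


packets = [make_packet(0, b"Infini"), make_packet(1, b"Band")]
message = reassemble(packets, expected_count=2)
print(message)  # b'InfiniBand'
```

Raising an exception plays the role of notifying the user when once-only, in-order, uncorrupted delivery cannot be confirmed.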

Aggregate bandwidths are 500
Mbytes/s, 2 Gbytes/s, and 6 Gbytes/s
with a 2.5-Gbit/s signaling rate. The
InfiniBand Trade Association (IBTA)
hopes to eventually boost performance
by moving to a higher signaling rate.
Bandwidths are scaled depending on how
links are aggregated: by 1, by 4, or
by 12. Serial links are traditionally
described in bits/s, rather than in
bytes/s, as is the case with parallel
links.

Each 1-wide InfiniBand link drives 2.5
Gbits/s (250 Mbytes/s) in each
direction. A 4-wide architecture is 10
Gbits/s (1 Gbyte/s per direction). A
12-wide architecture is 30 Gbits/s (3
Gbytes/s per direction). From a
physical perspective, links may be
copper or optical. They will be able
to drive 20 in. of printed wiring, or
17 m of copper cable, while
maintaining a worst-case bit error
rate of 1 x 10^-12.
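
The per-direction data rates follow from the 2.5-Gbit/s signaling rate and the 8-bit/10-bit encoding mentioned earlier: every 10 bits on the wire carry 8 bits of data, and 8 data bits make one byte. The Python sketch below works through that arithmetic for the three link widths. The function name link_rates and the use of decimal units (1 Gbit = 1000 Mbits) are assumptions made for this example.

```python
SIGNALING_RATE_MBITS = 2500  # 2.5 Gbits/s per 1-wide link, per direction


def link_rates(width: int):
    """Return (signaling Mbits/s, data Mbytes/s per direction,
    aggregate Mbytes/s over both directions) for a given link width."""
    signaling_mbits = SIGNALING_RATE_MBITS * width
    data_mbits = signaling_mbits * 8 // 10   # 8-bit/10-bit encoding overhead
    per_direction_mbytes = data_mbits // 8   # bits to bytes
    aggregate_mbytes = 2 * per_direction_mbytes  # dual-simplex: both directions
    return signaling_mbits, per_direction_mbytes, aggregate_mbytes


for width in (1, 4, 12):
    print(width, link_rates(width))
# 1 (2500, 250, 500)
# 4 (10000, 1000, 2000)
# 12 (30000, 3000, 6000)
```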

Room For Innovation 

On the software side, the IBTA wanted
to leave a lot of room for
applications vendors to innovate.
Instead of defining an absolute
applications programming interface
(API), the IBTA created an abstraction
called "Verbs." This abstraction
defines the functionality that an HCA
has to provide. Application vendors,
then, know what services are going to
be supported. But they're still free
to develop individual interfaces,
optimizing them for a particular
operating system.

From a management standpoint,
one node/switch must emerge as a
subnet manager. It can reside in a
node or in an HCA. Or, it may be
integrated as part of a switch. The
subnet manager is responsible for
assuring connectivity throughout the
fabric. It does this by sending
management datagrams. Every InfiniBand
device that participates on the fabric
has a subnet management agent.

Also, InfiniBand supports
unannounced hot-swapping. Designers
can just walk up to a module and pull
it out. The subnet manager will
automatically detect this event.

The IBTA, comprising over 180 com- 
panies, came into being in August 1999 
as a confluence of two earlier groups: 
the Next Generation I/O (NGIO) led by 
Intel, and the Future I/O led by IBM, 
Compaq, and Hewlett-Packard. 

The problems that brought InfiniBand
into being are based heavily on the
requirements of servers and clusters
of servers, sometimes dubbed "farms."
Bus architectures lack sufficient
headroom. Their capabilities are
strained by the voracious demand for
data transfers, particularly on the
Internet, and by the higher I/O data
rates required.

"There is a real crunch at the data
centers," says Jean S. Bozman,
research director, Commercial Systems
and Servers, International Data Corp.,
Framingham, Mass. Bozman spoke at
August's Intel Developer Forum in
San Jose, Calif.

"A high-speed interconnect such as
InfiniBand is going to promote
flexibility in computer system design.
When we have these new, faster links
we will be free to move the server
pieces farther apart, or arrange them
in a little different way, whereas
before it has all been in the confines
of a single cabinet or box," according
to Bozman. "It will also put an end to
fork-lift upgrades," she adds,
referring to the practice of removing
and replacing servers, en masse,
rather than upgrading existing
servers. "In fact, expandable servers
will enable capacity upgrades
on-the-fly."

OEMs looking to participate in the
development of InfiniBand-based
products have a number of
opportunities. Bozman advises vendors
to identify, early on, specific market
segments that they believe will adopt
InfiniBand. Then, they need to develop
plans for phased InfiniBand rollouts
by working with software vendors to
make sure key applications use
InfiniBand APIs.

"Building the InfiniBand
infrastructure is going to be kind of
a layered approach with, at first, a
lot of the technology coming in at the
edges," Bozman predicts. She sees
InfiniBand arriving in concert with
the move in servers from 32-bit
computing to 64-bit computing,
pointing out that the 64-bit versions
will support both 64-bit applications
and the large inventory of 32-bit
applications now in place.

Early InfiniBand components will
most likely be bridge chips and add-on
cards for connecting with existing
products via existing I/Os. Enlarged
support for clustering and server
farms will arrive in 2001, tying
together existing systems with
InfiniBand-type clustering. Full-blown
symmetrical multiprocessor servers
(SMPs) embodying InfiniBand will
probably emerge further down the line.

Some issues remain to be answered, 
though. For example, it's unclear if 
InfiniBand will complement or com- 
pete with bus architectures, such as the 
well-entrenched PCI and its successor, 
PCI-X, which is less than a year old. 

For more details, go to the group's
web site at www.infinibandta.org.

Steve Grossman 



Updated Tool Does Kernel-i 
Debugging For Real- 



The Linux Trace Toolkit (LTT) now
supports kernel-level debugging. It
has been available for
application-level debugging for some
time, but real-time developers had to
contend with basic debugging tools.
This version of LTT also supports the
Linux hard real-time application
interface (RTAI).

LTT is an open-source project
supported by Opersys Inc. of Montreal,
Canada, and Lineo Inc. of Lindon,
Utah. It's distributed under the GNU
General Public License, making it
freely available to developers. As a
graphical tool, it dynamically
displays system performance
information (see the figure). It can
be used to determine what process was
accessing hardware in a particular
time slice. Also, it can highlight I/O
device-driver latencies or application
dependence on device drivers. It's
especially useful in analyzing
synchronization problems.

LTT's graphical interface now provides execution details about
kernel applications, as shown in this event-flow graph.

Converting regular Linux applications
to hard real-time applications under
RTAI requires one call,
rt_make_hard_real_time(). But making
sure the application does what it's
intended to do under these constraints
is now much easier. LTT support
currently works in a single-processor
environment. The support hooks have
not been added in the multiprocessor
Linux kernel. This is a limitation of
combining RTAI, LTT, and
multiprocessing support. LTT already
works in a multiprocessor environment
for regular applications.

A minor change to 
the LTT trace-file for- 
mat makes it incom- 
patible with prior ver- 
sions, though. On the 
plus side, the binary 
format is smaller, 
which is very handy 
because tracing over 
long periods of time 
can generate rather 
large files. The binary 
encoding also im- 
proves performance. 

For more information about the LTT,
visit www.opersys.com/LTT/.
William Wong 